Successful Spam Filtering

Jeffrey Fulmer 


Email is an effective and inexpensive collaboration tool.
Since Ray Tomlinson's @ sign helped specify user name at host
computer, electronic mail has become an integral part of our
lives. Today nearly 60 billion messages are sent on a daily
basis. Of that total, more than 60 percent can be classified as
spam. Productivity gains derived from email are offset by this
nuisance.

Electronic junk mail is damaging to both mail systems and
employee productivity. Nearly every enterprise is affected by
it, yet, according to Gartner Research, only 10 percent have
spam-filtering technologies in place. This lapse is not for lack
of filtering technologies; there are many products from which to
choose. Filtering solutions that fail to consider business
requirements, however, will not succeed. This article will
examine "best practices" for a successful implementation.

Any systems administrator who has participated in a project in
which large numbers of end users are affected understands that
spam filtering should not be taken lightly. It affects nearly
every computer user in the company. The risks are high, but so
are the rewards. If you reduce daily mailbox maintenance by 5
minutes for each of 1000 employees, then you will save the
company the cost of about 8 average salaries. Those savings
don't include bandwidth reduction and damage prevented by virus
quarantine.

The line is fine. You can offset those savings by deleting a
time-sensitive business contract or an important sales lead. One
such occurrence could be enough to derail your entire filtering
project. For this reason, it is necessary to examine the
environment and the culture into which you plan to introduce
spam filtering.

Requirements Gathering 

As with any business endeavor, communication is the key to
success. From the onset, you must engage business units in order
to fit your filter into the enterprise. Their feedback is vital.
Explain the effort and its importance to the company. Few people
like spam, so this is not a hard sell. If they desire your
project, then it is more likely to succeed. Once they agree to
combat spam, explain your proposal in detail. Then be prepared
to listen.

Concerns will arise as details become apparent. New technologies
are often greeted with suspicion, and this one is no exception.
When users learn that private conversations will be filtered and
manipulated, you will have their complete attention. Some
employees will invoke Big Brother and sow seeds of discontent.
Most will worry about messages that never get delivered.
Envision this meeting before it occurs. When you select a
technology for implementation, realize that it has to be
flexible. Many of the questions you encounter will be
non-negotiable requests. Can a particular address always go
through? Can you let all my messages through? The more
legitimate concerns you accommodate, the better your chances for
acceptance.

Nothing will scrap your solution quicker than business
disruptions. If new contacts are deleted, or if timely
information is quarantined, then immediate backlash will occur.
The thought of such interruptions gives people reason to pause.
If you break business functionality, then management may scrap
your system or scale it back so that it's rendered meaningless.
During these sessions, you must gather information to build a
comprehensive picture. Which domains should be whitelisted? Mail
from some companies should always go through. Does your daily
terminology match a pattern of spam? For example, while
SpamAssassin has a file dedicated to pornography keywords, it
has another devoted to medicinal drugs. Its keyword phrases file
contains a large section on low-cost loans and credit cards.
Administrators in the pharmaceutical or financial industries
need to be sensitive to this reality before implementation. The
more tuning you can perform in advance, the less pain you'll
suffer later.

Few groups will be affected more by this project than the help
desk. They might receive calls before implementation. Once users
learn that email might be filtered for spam, they may call the
help desk at the first sign of a problem. If a user discusses a
filter with a help desk operator who knows nothing about it,
confusion will reign. A sustained increase in help desk tickets
will undermine gains from spam reduction. A carefully planned
implementation will alleviate this problem. Make sure help desk
personnel are aware of the project from the onset, so that they
can answer, "The spam filter hasn't been installed yet; I'll
open a ticket with the mail administrator." Once the project is
implemented, you can reduce calls by automating as much as
possible. Allow people to view quarantined mail before it
expires. A Web interface that provides user-driven whitelisting
will make everyone's life easier. The more control you can put
on the desktop, the fewer calls and tickets you will generate.

Preparation

There are two key elements to a successful filtering
implementation: effective communication and gradual
assimilation. Business units are engaged in order to gather
necessary requirements. Once you are supplied with appropriate
information, software selection is easy. The administrator
simply needs to select a package that meets the requirements. If
you work in a shop that has not fully embraced open source
technology, then a thorough list of requirements could help you
make the case for a particular software package. A comparison of
features and requirements will help demonstrate its
appropriateness.

While this article is focused on a successful implementation,
one issue must be addressed with regard to software selection.
To adequately evaluate a product, it is important to understand
the methods used to produce the false positive and false
negative numbers the vendor advertises. If false positives are
tracked as a percentage of all mail rather than as a percentage
of mail that was flagged as spam, then the vendor's numbers will
be considerably lower than actual user experience. The
discrepancy will be compounded if you rely on the vendor's
unrealistic data when you introduce the project to business
units.
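
To see how much the choice of denominator matters, the two ways of reporting false positives can be compared side by side. The following is only a sketch: the message counts are invented for illustration, and fp_rate is a hypothetical helper, not part of any filtering product.

```shell
#!/bin/sh
# Hypothetical counts for one day of mail; none of these numbers come
# from the article or from any vendor.
total_mail=10000      # every message that reached the filter
flagged_spam=6000     # messages the filter marked as spam
false_positives=30    # flagged messages that were actually legitimate

# fp_rate NUMERATOR DENOMINATOR -> percentage with two decimals
fp_rate() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f", (n / d) * 100 }'
}

rate_vs_all=$(fp_rate "$false_positives" "$total_mail")
rate_vs_flagged=$(fp_rate "$false_positives" "$flagged_spam")

echo "false positives as % of all mail:     ${rate_vs_all}"
echo "false positives as % of flagged mail: ${rate_vs_flagged}"
```

The same 30 misfiled messages look smaller when divided by all mail than when divided by flagged mail, so ask the vendor which denominator was used before quoting a figure to business units.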

To alleviate concern for business disruption, a good filtering
system should offer several levels of granularity. Users should
be provided with a mechanism to opt out of the process entirely.
An automatic process allowing users to whitelist important email
addresses or domains reduces administrative overhead and engages
users. The more options you provide, the more empowered they'll
feel. If your solution is another tool in their box, then
they'll be more likely to accept this new technology and assist
its implementation.

No amount of preparation is going to prevent false positives;
legitimate messages will be flagged as spam. When the filter is
first installed, test it in "audit" mode. During this phase,
messages are filtered and scored, data is collected, and reports
are generated, but no messages are deleted or quarantined. The
administrator should try to paint a comprehensive portrait of
enterprise spam. The idea is to establish score ranges for
obvious spam, likely spam, and legitimate messages (ham). As you
gather statistics, continue to tune the filter. With careful
adjustment, you are ready for the next step: spam reduction.
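
The three score ranges can be written down as a small classifier while you are still in audit mode. This is a sketch under stated assumptions: the boundaries 5.0 and 10.0 and the function name classify_score are placeholders chosen for illustration, and real boundaries should come from your own audit data.

```shell
#!/bin/sh
# Hypothetical band boundaries; derive real ones from audit-mode data.
likely_spam_min=5.0
obvious_spam_min=10.0

# classify_score SCORE -> prints "ham", "likely-spam", or "obvious-spam"
classify_score() {
    awk -v s="$1" -v likely="$likely_spam_min" -v obvious="$obvious_spam_min" '
        BEGIN {
            if (s + 0 >= obvious + 0)     print "obvious-spam"
            else if (s + 0 >= likely + 0) print "likely-spam"
            else                          print "ham"
        }'
}

for score in 1.2 6.8 14.5; do
    echo "${score} -> $(classify_score "$score")"
done
```

Keeping the boundaries in two variables makes it easy to rerun the same audit data after each round of tuning.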

With reports and statistics in hand, work with business units to
establish thresholds for quarantine and automatic deletion. It's
a good idea to initially set thresholds higher than your level
of confidence. In time you can ease them down to match your
statistical analysis. A gradual implementation will help
increase the comfort of others. Some business units may opt out
of the system altogether. None of their mail should ever be
stopped. Make sure the system meets their requirements. Now,
there is just one more step before implementation.

All good systems require good documentation, and this is no
exception. Provide a detailed description of the system and its
processes. Describe how to view and retrieve quarantined mail.
Users should know how to append addresses and domains to a
whitelist. Some administrators send mail to the client with spam
analysis embedded in the message. Provide instructions that will
enable a user to filter messages in their email client based on
that information. A desktop administrator might be able to push
a configuration to every desktop. A client-side mailbox could
serve as a primary or secondary quarantine area. Simply because
some users have opted out of the system doesn't mean they should
be excluded. Your documentation should provide instructions on
how to avoid spam. Many times users don't know what they did to
start the deluge.

Implementation 

What good is spam filtering if you can't demonstrate how well it
works? Make sure you have a reporting mechanism in place before
implementation. At minimum, it should measure total volume by
day, month, and year, as well as the top outbound and inbound
senders. Once filtering is in place, the report should capture
statistics about the spam itself. How many messages were flagged
as spam and how many passed as legitimate? For what reasons were
messages rejected? The more data you have, the better you'll be
able to react to it. What is the effect of moving a spam
threshold two-tenths of a point? An adequate report will provide
the answer.
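
If your reporting tool can export one score per message, the effect of a threshold change can be answered with a few lines of shell. This is a minimal sketch: the score list is invented, and count_flagged is a hypothetical helper that simply counts scores at or above a threshold.

```shell
#!/bin/sh
# count_flagged THRESHOLD -> reads one score per line on stdin and prints
# how many scores are at or above THRESHOLD
count_flagged() {
    awk -v t="$1" '$1 + 0 >= t + 0 { n++ } END { print n + 0 }'
}

# Hypothetical scores collected during audit mode.
scores="0.4
2.9
4.9
5.0
5.1
7.3
12.6"

# Compare the current threshold with one lowered by two-tenths of a point.
for threshold in 5.0 4.8; do
    flagged=$(printf '%s\n' "$scores" | count_flagged "$threshold")
    echo "threshold ${threshold}: ${flagged} messages flagged"
done
```

Run against real audit data, the difference between the two counts is the set of messages you would want to inspect by hand before committing to the lower threshold.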

A good administrator is careful and methodical. Okay, he's
paranoid. If you can't bring yourself to dump obvious spam into
/dev/null, then send it to another mailbox. If you have
sufficient disk storage, cycle the spam boxes with a log
rotation program. Retain each daily spam box for x number of
days, then send it to /dev/null. This will provide a buffer from
which to recover false positives. If space is lacking, daily
files can be off-loaded to another server for rotation.
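
The daily cycle described above can be sketched in plain shell if no log rotation program is handy. The file layout (a box named spam with numbered older copies beside it) and the function name rotate_spam_box are assumptions made for this example only.

```shell
#!/bin/sh
# rotate_spam_box DIR KEEP
# Hypothetical layout: DIR/spam holds today's quarantined mail and
# DIR/spam.1 .. DIR/spam.KEEP hold earlier days, oldest last.
rotate_spam_box() {
    dir=$1
    keep=$2

    # The oldest box falls off the end; this is the "send it to
    # /dev/null" step.
    rm -f "${dir}/spam.${keep}"

    # Shift every remaining box up by one day, oldest first.
    i=$((keep - 1))
    while [ "$i" -ge 1 ]; do
        if [ -f "${dir}/spam.${i}" ]; then
            mv "${dir}/spam.${i}" "${dir}/spam.$((i + 1))"
        fi
        i=$((i - 1))
    done

    # Today's box becomes spam.1 and a fresh, empty box takes its place.
    if [ -f "${dir}/spam" ]; then
        mv "${dir}/spam" "${dir}/spam.1"
    fi
    : > "${dir}/spam"
}
```

Called once a day from cron with a retention of, say, seven days, this keeps a week of spam available for recovering false positives. Delivery into the box should be paused or locked while the rename happens, which this sketch does not attempt.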

Remember, gradual assimilation is one of the keys to success.
Anybody can reduce spam by 60% in a single month. It's better to
achieve that in 6 months to a year without interruption to the
business. Spam filtering requires continued administration.
Unlike Ron Popeil, you can't just "set it and forget it."

Matt Cramer is an Information Security Architect in a large
multi-national enterprise; he implemented a spam filtering
system 2 years ago. During that time, email volume increased
150% while Cramer was able to reduce unwanted email by 93%.
Legitimate email messages are flagged as spam just
one-hundredth of one percent of the time. He never stopped
tuning the configuration to meet real-world conditions.

Cramer provides some advice for aspiring spam hunters.
"Filtering needs to strike a balance, specific to your
enterprise, between false positives and false negatives (missed
spam). Whether you build your own filters or purchase a
commercial one, it is important for an enterprise to understand
what the business will tolerate for these values, because no
spam filter will be perfect."

Jeffrey Fulmer has administered enterprise computer systems profes¬ 
sionally since 1995. He is an open source software developer and the 
primary author of siege. He currently resides in Pennsylvania with his 
wife and English bulldog. 

Using New Features in SpamAssassin 3.0

Robert Hoskins and Dole Nielsen 


In this article, we will look at the new features that have been
added to SpamAssassin (SA) 3.0 and show you how to use them.
We'll start by giving an overview of the changes in SpamAssassin
3.0 and then show an example upgrade of an existing SpamAssassin
2.64 installation.

What’s New in SA 3.0 

There are a number of functionality changes and new features in
SpamAssassin 3.0. These modifications include:

• License change
• API changes
• New spam filtering rules
• Database and LDAP changes
• Network changes
• New plug-in framework

Each of these areas is covered briefly below. 

License Change 

Note that we are not lawyers, so you should speak to an attorney
if you have questions regarding licensing issues. Before version
3, SpamAssassin was licensed under the GNU Public License (GPL).
However, with SpamAssassin moving to the Apache Foundation, it
is now covered by the Apache Software Foundation License (ASF).
If you are simply using the SpamAssassin software in the
operations of your network, there is no need to worry. However,
if you are a developer and want to incorporate SpamAssassin with
a GPL-based software package, then you may have a problem
because the ASF license may not be compatible with the GPL. The
ASF license is, however, approved by the Open Source Initiative
(OSI).

API Changes 

There have been significant changes in the application 
programming interface (API) for SA 3.0. Unless you are a 
developer, the changes to the SpamAssassin API mostly 
affect the programs used to integrate SpamAssassin into your 
mail transfer agent (MTA), such as amavisd-new and 
MIMEdefang. To run SpamAssassin 3.0 successfully, you 
should be running the following versions of SpamAssassin 
MTA integration software at a minimum: 

Amavisd-new: amavisd-new-20030616-p8 (2.2.1 is latest)
MIMEdefang: 2.42 and higher (2.49 is latest)
Qmail-Scanner: 1.23 and higher (1.23 is latest)


As with most open source software, "later versions are better".
So if you have a choice, go with the latest stable version of
the MTA integration software. Of course, the internal
SpamAssassin components (such as spamc) have been updated, so if
you're using procmail to integrate with your MTA, you don't have
to do anything.

New Spam Filtering Rules 

A number of new rules are distributed with SpamAssassin 3.0 by
default. SpamAssassin 3.0 distributes a total of 937 rules, and
SpamAssassin 2.6 distributes 601 rules, making the difference
336 additional rules in 3.0. However, some of this difference
may result from deleted rules rather than additional rules. In
addition to the changed rules, a number of the default scores
have changed, as has been the case in the past with new SA
versions (even minor upgrades).

Database and LDAP Changes 

The most significant change in the database and LDAP sup¬ 
port is the ability to store user preferences in a MySQL or 
Postgres database or an LDAP store. This is a major develop¬ 
ment for any larger site that would like to simplify their 
SpamAssassin setup by placing all of their user information 
into a database. Also, the Bayesian information (tokens, 
scores) can be placed into a database or directory. Benefits of 
doing this include: 

• Ability to store preferences, Bayesian scores, and
auto-whitelist information for users who have no home
directories on the server running SpamAssassin

• Ability for the end user to easily manage SpamAssassin
settings by implementing a graphical user interface to the
database/directory

For smaller sites (10-20 users) that don't have LDAP or a 
database infrastructure already in place, it might not be 
worthwhile to deploy a database or directory to get these ben¬ 
efits. However, for larger sites or any site with an existing 
database or directory infrastructure, it is probably worthwhile 
to implement these features. 

Network Changes 

Previous versions of SpamAssassin enabled the user to identify
trusted networks. SpamAssassin 3.0 provides the ability to
further identify trusted networks. Specifically, the idea of
"internal networks" is introduced. These are machines that are
internal mail relay machines or MX relay hosts for your domains.
This list is used to detect spammers who send their garbage
directly to MX (or backup MX) hosts. Mail relay machines that
accept mail directly from dial-up hosts or high-speed DSL/cable
modem-connected clients should not be placed on the internal
network lists. Instead, they should be placed only on the
"trusted networks" list.
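
As an illustration of how the two lists relate, the following sketch appends a sample pair of settings to a configuration file. The option names trusted_networks and internal_networks are SpamAssassin's own; the addresses are documentation-range placeholders, and write_network_settings is a hypothetical helper written for this example.

```shell
#!/bin/sh
# write_network_settings FILE
# Appends a sample trust configuration to FILE. All addresses are
# documentation-range placeholders, not values from the article.
write_network_settings() {
    cat >> "$1" <<'EOF'
# MX and internal relay hosts for our own domains
internal_networks 192.0.2.10 192.0.2.11
# Relays we trust not to forge headers; includes a relay that accepts
# mail directly from dial-up or DSL/cable clients
trusted_networks  192.0.2.10 192.0.2.11 198.51.100.0/24
EOF
}
```

In this sample the 198.51.100.0/24 relay appears only on the trusted list because it takes mail straight from client machines, while every internal host is also listed as trusted. After adding such lines to your site configuration, running spamassassin --lint is a reasonable way to catch syntax mistakes.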

New Plug-in Framework 

This version of SpamAssassin allows developers to extend its
functionality by using plug-ins. This will enable interested
parties, either open source or commercial, to easily extend
SpamAssassin's abilities as they wish. SpamAssassin 3.0 is
distributed with the following four plug-ins:

• Hashcash 

• RelayCountry 

• Sender Policy Framework (SPF) 

• URIDNSBL 

In the Hashcash scheme, senders include proof of spent CPU time
in order to compute a value as an indication that they are not
spammers. Including an acceptable hashcash value will lower the
SpamAssassin score for the message. RelayCountry enables the
SpamAssassin user to utilize a new geographic-based token, which
identifies the mail servers through which the message passes on
its way to the recipient. SPF implements the Sender Policy
Framework checks on the sender's domain. SPF can be thought of
as reverse mail exchange (MX) records that define which IP
addresses are allowed to originate email for a domain. URIDNSBL
gives SpamAssassin the ability to check the body of the message
for spammer-related URLs and help identify spam messages by
adjusting the score appropriately.

Upgrading from SA 2.x 

The balance of this article concerns upgrading a SpamAssassin
2.x installation to 3.0.2. We used Gentoo Linux version 2004.3
(with all updates applied as of January 21, 2005) as the
platform for our examples. We cannot possibly cover all the
potential permutations of SpamAssassin configurations. Thus, for
the purposes of the upgrade coverage in this article, we made
the following assumptions:

• Initial installation of SpamAssassin was version 2.64

• Per-user invocation of SpamAssassin by spamc/spamd and 
procmail version 3.22 

• Postfix version 2.1.5 

• Have installed SA files in their default locations 

• Have sudo or root access to the machine 

Other versions of software we used included: 

• Perl 5.8.5 

• MySQL 4.0.23 

• OpenLDAP 2.1.30 

The steps to upgrade SpamAssassin are as follows: 

1. Download and build SpamAssassin 3.0.2. 

2. Shut down spamd and Postfix. 

3. Synchronize the old SA 2.64 Bayesian journals. 







4 — Sys Admin 


www.sysadminmag.com 


Spam Supplement 2005 
4. Back up the old SpamAssassin 2.64 installation. 

5. Install new SpamAssassin v3.0.2. 

6. Upgrade the old SA 2.64 Bayesian journals to the new 
3.0 format. 

7. Start up spamd and Postfix and test. 

Download and Build SA 3.0.2 

Download the SA tar files from one of the SpamAssassin 
Apache Software Foundation Web site mirrors like this: 

bash$ wget \
    http://apache.roweboat.net/spamassassin/source/Mail-SpamAssassin-3.0.2.tar.gz


Consult the References section for a pointer to the complete 
list of ASF mirrors. Next, build SA like this (output from 
build scripts/commands has been deleted): 

bash$ tar xzvf Mail-SpamAssassin-3.0.2.tar.gz
bash$ cd Mail-SpamAssassin-3.0.2
bash$ perl Makefile.PL 
bash$ make 

You have built SpamAssassin 3.0.2 and can now move on to the
next step: shutting down spamd and Postfix. This is accomplished
by executing the following commands:

bash$ sudo /etc/init.d/postfix stop 
Stopping postfix... [ ok ] 

bash$ sudo /etc/init.d/spamd stop
Stopping spamd... [ ok ] 

Synchronize Bayesian Journals 

This step may or may not be necessary, but it is easy to perform
and doesn't harm anything if the users on your system are not
using journaling. Some users choose to use journaling with the
Bayesian configuration for performance purposes. Journaling
causes each Bayesian-related change to be simply journaled and
then, at the end, the journal is synced into the db files. If
journaling is going on, then this step is required to make sure
everything in the journals gets written out before you upgrade
the Bayesian DB.

To accomplish this task, we have written a small shell script
called syncJournal-2.64.sh. The script assumes that SA users can
be identified by having a .spamassassin directory under their
home directories:


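#!/bin/sh
PATH=/bin:/usr/bin

# Collect every account whose home directory (field 6 of /etc/passwd)
# contains a .spamassassin directory. "test ! -d" exits non-zero when
# the directory exists, which makes the awk condition true.
users=`awk -F: \
    '{ if (system("test ! -d " $6 "/.spamassassin")) print $1; }' \
    /etc/passwd`

for user in ${users} ; do
    echo "syncing journal for ${user}"
    su ${user} -c 'sa-learn --rebuild'
done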


To run the script, simply invoke it like this: 


bash$ sudo ../syncJournal-2.64.sh 

Back Up the Old SpamAssassin 2.64 Installation

This step doesn't need to be performed per se, but should be in
case you want to go back to your old SpamAssassin 2.64
configuration. The backup-2.64.sh script presented here takes
every system file or directory (but not user files) from the SA
2.64 default installation locations and renames it in place. The
files are given the same filenames but have "-2.64" appended to
the end. Please note that the Perl version is set on the second
line of the script. If you are running something other than Perl
5.8.5, please adjust this line accordingly:

#!/bin/sh
perlVersion=5.8.5

# Configuration directory and executables
mv /etc/mail/spamassassin /etc/mail/spamassassin-2.64
mv /usr/bin/sa-learn /usr/bin/sa-learn-2.64
mv /usr/bin/spamassassin /usr/bin/spamassassin-2.64
mv /usr/bin/spamc /usr/bin/spamc-2.64
mv /usr/bin/spamd /usr/bin/spamd-2.64

# perllocal.pod is copied, not moved, so the original stays in place
cp -p /usr/lib/perl5/${perlVersion}/i686-linux/perllocal.pod \
    /usr/lib/perl5/${perlVersion}/i686-linux/perllocal.pod-2.64

# Perl modules
mv /usr/lib/perl5/site_perl/${perlVersion}/Mail/SpamAssassin.pm \
    /usr/lib/perl5/site_perl/${perlVersion}/Mail/SpamAssassin.pm-2.64
mv /usr/lib/perl5/site_perl/${perlVersion}/Mail/SpamAssassin \
    /usr/lib/perl5/site_perl/${perlVersion}/Mail/SpamAssassin-2.64
mv /usr/lib/perl5/site_perl/${perlVersion}/i686-linux/auto/Mail/SpamAssassin \
    /usr/lib/perl5/site_perl/${perlVersion}/i686-linux/auto/Mail/SpamAssassin-2.64

# Manual pages
mv /usr/share/man/man1/sa-learn.1 /usr/share/man/man1/sa-learn.1-2.64
mv /usr/share/man/man1/spamassassin.1 \
    /usr/share/man/man1/spamassassin.1-2.64
mv /usr/share/man/man1/spamc.1 /usr/share/man/man1/spamc.1-2.64
mv /usr/share/man/man1/spamd.1 /usr/share/man/man1/spamd.1-2.64

for file in /usr/share/man/man3/Mail::SpamAssassin* ; do
    mv ${file} ${file}-2.64
done

# Default rules and templates
mv /usr/share/spamassassin /usr/share/spamassassin-2.64
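
Should you need to go back to 2.64, the renames have to be undone. The article does not supply a rollback script, so the following is only a sketch of one possible approach: restore_backup is a hypothetical helper that moves the current file aside with a ".new" suffix and puts the "-2.64" copy back under its original name.

```shell
#!/bin/sh
# restore_backup SUFFIX PATH...
# For each PATH, if PATH-SUFFIX exists, move any current PATH aside
# (adding ".new") and rename the backup to PATH.
restore_backup() {
    suffix=$1
    shift
    for target in "$@"; do
        backup="${target}-${suffix}"
        if [ ! -e "$backup" ]; then
            echo "no backup for ${target}, skipping" >&2
            continue
        fi
        if [ -e "$target" ]; then
            mv "$target" "${target}.new"
        fi
        mv "$backup" "$target"
    done
}
```

A call such as restore_backup 2.64 /usr/bin/spamc /usr/bin/spamd would cover two of the executables; the remaining paths from the backup script would need to be listed the same way, and spamd and Postfix should be stopped first.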

Install SpamAssassin v3.0.2 

Just run a make install in the SpamAssassin-3.0.2 installation
directory like this:

bash$ sudo make install 

This will install SpamAssassin in the correct locations on your 
system. 

Upgrade the SA 2.64 Bayesian Journals to 3.0 

To keep our Bayesian history, we must import the SpamAssassin
2.64 Bayesian journals for each user to the new SA 3.0.2
installation. We have written the following script for this
purpose, and it's called syncJournal-3.0.2.sh. It takes every
user in the system with a .spamassassin directory and converts
the Bayesian journal to the new 3.0 format:


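#!/bin/sh
PATH=/bin:/usr/bin

# Same user discovery as syncJournal-2.64.sh: print each account whose
# home directory contains a .spamassassin directory.
users=`awk -F: \
    '{ if (system("test ! -d " $6 "/.spamassassin")) print $1; }' \
    /etc/passwd`

for user in ${users} ; do
    echo "syncing journal for ${user}"
    su ${user} -c 'sa-learn --sync'
done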

After the upgrade to SpamAssassin 3.0.2, place the
syncJournal-3.0.2.sh script in the directory above the SA
installation directory and run it like this:

bash$ sudo ../syncJournal-3.0.2.sh

Note that the script will output an error for every user on the 
system, similar to this: 

bayes: bayes db version 2 is not able to be used, aborting! at
/usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/BayesStore/DBM.pm line 160.

These errors can safely be ignored. 

Start spamd and Postfix and Test 

Finally, we need to start spamd and Postfix and test the 
installation to make sure everything is working as expected. To 
do this, just run the startup scripts for spamd and Postfix like 
this: 

bash$ sudo /etc/init.d/spamd start
Starting spamd... [ ok ] 

bash$ sudo /etc/init.d/postfix start 
Starting postfix... [ ok ] 

To test, we run two test messages through SpamAssassin that are
included as part of the SA 3.0.2 distribution. One is a regular
test message, and the other is a message that contains the
special SpamAssassin test called GTUBE (short for Generic Test
for Unsolicited Bulk Email).

Non-Spam Test 

The sample non-spam message distributed by SpamAssassin is
located in the top-level SpamAssassin-3.0.2 directory as
"sample-nonspam.txt". This is an excellent test message as it
exhibits many characteristics that SpamAssassin looks for in a
message, such as multiple URLs and spaces between letters (e.g.,
"Q u o t e O f T h e M o m e n t"). Simply email the sample
non-spam message via your Mozilla Thunderbird email client (if
you use the SpamAssassin system as your mail relay). Or, make
your current directory the SpamAssassin top-level directory and
execute a mail command like this:

bash$ sendmail you@yourdomain.com < ./sample-nonspam.txt

Replace you@yourdomain.com with an account name on the 
machine running SpamAssassin. The message should make it 
through to your inbox. 






Spam (GTUBE) Test 

The test spam message is distributed as sample-spam.txt in the
top-level SpamAssassin-3.0.2 directory. As with the non-spam
message, send this message to yourself from your GUI email
client outside your network. Alternatively, you can send the
message from the machine running SpamAssassin using the
following command line:

bash$ sendmail you@yourdomain.com < ./sample-spam.txt

The message should be identified as a spam message by
SpamAssassin and be disposed of accordingly.
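
If your setup delivers tagged spam rather than discarding it, you can confirm the result by looking at the headers of the delivered copy. The sketch below assumes you have saved that copy to a file and that SpamAssassin is adding its X-Spam-Flag header to spam; is_flagged_spam is a hypothetical helper written for this example.

```shell
#!/bin/sh
# is_flagged_spam FILE -> exit status 0 if the header block of FILE
# carries "X-Spam-Flag: YES", non-zero otherwise
is_flagged_spam() {
    # sed prints only the header block (everything up to the first
    # blank line), so a matching string in the body is not counted
    sed -n '1,/^$/p' "$1" | grep -q '^X-Spam-Flag: YES'
}
```

For example, is_flagged_spam saved-message.txt && echo "tagged as spam" should print the confirmation for the GTUBE message and nothing for the non-spam sample.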

Cleaning Up Large Mailing Lists:
Removing Bad Addresses

Jeff Bennett 


Supporting corporate Web sites, especially retail ones, often
includes administering mail servers that perform regular
mailings to large customer mailing lists. Managers are generally
motivated to increase the size of these lists by any means
necessary, as this is a good way to increase the customer base.
In theory, the more email addresses you mail to, the more
customers you have. In practice, this often means "encouraging"
the people who visit your Web site to register before they can
continue onto more desirable site functions, a practice not well
loved by all end users.

The Problem 

While the marketing department of any firm may be pleased to
have accumulated a mailing list of 500,000 email addresses for
their weekly mailing, this joy is not always shared by the
systems administrator. Besides the minor burden of efficiently
managing a huge send every week, the main headache is the fact
that many of the addresses collected will be bogus. When an
annoyed Web surfer inputs "bobsyeruncle@nodomain.all" as his
email address in the registration process, the effect of one bad
address added to the mailing list is less than minimal. But what
if 10,000 people do that each week?

There is no limit to the imagination that goes into creating
bogus email addresses; however, the amusement wears off quickly
when they start clogging up your system. Besides lengthening the
duration of your sends, the bounces come back by the thousands,
and the stern emails start to arrive from other postmasters,
both automated and human, informing you that rules are being
broken. Even if you are emailing only to solicited customers,
you may find yourself on spam lists and blacklists if you are
sending thousands of phantom messages that do nothing more than
take up bandwidth and machine resources as they are processed
and passed back and forth.

The responsibility of the sys admin in this situation is to
clean up the mailing list periodically. While the marketing team
may not be thrilled to hear that there are 80,000 or so bad
addresses in the current mailing list, they should recognize the
necessity of cleaning them out. You can soften the blow a little
by running your cleanup script more frequently, depending on the
rate at which the master mailing list increases in size, and the
rate at which your send performance degrades.

The Solution

The criteria for weeding out "bad" addresses should not be taken
lightly. There are many reasons for a message to fail in
reaching its destination, and many of them are not indicative of
a nonexistent or inactive email address. It may be quick and
easy to resort to sendmail's maillog to try to determine which
addresses are bad, but not enough information is presented there
to make an intelligent decision. Sendmail itself has guidelines
for managing bounces (see Costales, Sendmail, 3rd Ed., pp.
516-517), but when it comes to customized extraction to discover
bad addresses, you are on your own. I have found that an
efficient and minimally complex way to prune a mailing list is
via a script that crawls through the bounces and extracts the
bad addresses, so that they can be removed from your company's
customer list.

The best way to ensure that you are purging addresses from a
list with some certainty that they are actually "bad" is to use
the SMTP or Enhanced SMTP (ESMTP) error codes (RFC 1893), and
these exist only in the mail header of the bounced mail. Of the
many different ESMTP codes that exist, these are the ones I
deemed to represent a bad (i.e., nonexistent) address:

5.0.0 Service unavailable (this is equal to an SMTP 554
      protocol error, recipient address rejected)
5.1.1 User unknown
5.1.2 Host unknown
5.1.3 Domain not allowed
5.1.6 Destination address (user) unknown
5.1.8 User unknown

The criteria for an address cleanup may vary depending on the
size and type of send being done. For example, I thought about
purging addresses for which I got the 5.2.1 "mailbox disabled"
or 5.2.2 "mailbox is full" errors for three consecutive weeks,
but this would require a completely separate process and a new
script, so I put this on my to-do list as a future enhancement
of this project.
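
The decision rule described here, purge on the listed 5.0.0 and 5.1.x codes and leave everything else alone, can be captured in a few lines of shell. This is a sketch rather than the author's script; is_bad_address_status is a hypothetical helper name.

```shell
#!/bin/sh
# is_bad_address_status CODE -> exit status 0 if CODE is one of the
# enhanced status codes the article treats as a bad address
is_bad_address_status() {
    case "$1" in
        5.0.0|5.1.1|5.1.2|5.1.3|5.1.6|5.1.8) return 0 ;;
        *) return 1 ;;
    esac
}

# Try one code from the list, one mailbox-full code, and one temporary
# failure code.
for code in 5.1.1 5.2.2 4.4.1; do
    if is_bad_address_status "$code"; then
        echo "${code}: purge"
    else
        echo "${code}: keep"
    fi
done
```

Keeping the list in one case statement makes it easy to tighten or loosen the criteria later, for instance when the mailbox-full handling is added.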

Having determined my desired result (the purging of addresses
that caused mail to bounce with the aforementioned error codes),
I faced the task of searching the mail file of the recipient of
the bounces for two things:

1. The pieces of mail with the error codes in question, and

2. The original destination address that caused this error.

A bounced piece of email consists of the original mail in its
entirety (original header and all), with a new header attached
during the return process. It is this new header that provides
the reason for the return along with a few other pieces of
information. A simple mail header for a successfully sent
message looks something like this:
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Received: from mai1 host.relaydomain.com (mailhost.relaydomain.com 
[192.192.192.x]) by mai1 host.immense-isp.com ( 8 . 8 . 5 / 8 . 7.21 with 
ESMTP id AAA34567 for <yourbuddy@imrnense-i 5 p.COITi>; Tue, 18 Sep 
2004 14:39:24 -0800 (PST) 

Received: from mailhost.yourdomain.com (mailhost.yourdomain.edu 
[124.124.124,x]) by mailhost.relaydomain.com ( 8 . 8 . 5 ) id BBB123; 
Tue. Sep 18 2004 14:36:17 -0800 (PST) 

From: you@yourdomain.com 
To: yourbuddy@immense-isp.com 
Date: Tue, Sep 18 2004 14:36:14 PST 

Message-Id: <you033456712345-00000123@mailhost.yourdomain.com > 
Subject: Lunch today? 

MIME-Version: 1.0 
Content-Type: multipart/report: 

The header of a bounced piece of mail is considerably more 
complex and lengthy, and so much of it is irrelevant that 
including a hundred-line example here would not add any clarity 
to our task. Suffice it to say that as each host handles a message 
(and it may be passed around a bit before it finds its way 
back to you), it adds header info. Included in the superfluous 
information may be content comments, mail program information, 
and automatically inserted comments from the postmaster 
of any of the hosts. Often there will be multiple clues as to why 
this piece of mail has returned to your mailbox, but only one 
line is necessary, and this is not duplicated by any host other 
than the one that rejects the mail: 

Status: 5.0.0 

Now that I've gotten to the crux of the matter, it should just be 
a simple grep command to find the "Status:" line and we're all 
set, right? Close, but there are still a couple of things to do. 
Fortunately, the aforementioned status line always comes with 
a couple of preceding lines attached: 


everything after the first. One Perl shift() gets this done once 
I split() and reverse() the line. Here is a before-subroutine 
and after-subroutine look at the extracted lists. 


Before: 

Final-Recipient: rfc822; taniac@netconuver.com 
Final-Recipient: jb@newnet.org 
Final-Recipient: rfc822; funyguy@severe.net 
Final-Recipient: joeblow@garbage.com 
Final-Recipient: rfc822; jeneral@fred.net 
Final-Recipient: rfc822; johnnygoode@comman.doh 
Final-Recipient: rfc822; zerbil@todd.ca 
Final-Recipient: goboy@betyeruncle.biz 
Final-Recipient: rfc822; saliva@spit.net 
Final-Recipient: rfc822; tenspot@sawbuck.com 
Final-Recipient: rfc822; anybody@hetmail.com 
Final-Recipient: rfc822; yermom@yerhouse.co.uk 
Final-Recipient: geffen@deuhland.bel 



Listing 1 badadd.pl 

#!/usr/bin/perl -w 

# PART ONE: locate and extract the lines containing the addresses for 
# each error code 

# Define my input file $infile as the mail file (inbox) for the user to 
# whom all the bounces come 
$infile = "feedback_mailfile"; 
$x = 0; 

# define a separate search for each of my 6 defined error codes 
$search1 = 'Status: 5.0.0'; 
$search2 = 'Status: 5.1.1'; 
$search3 = 'Status: 5.1.2'; 
$search4 = 'Status: 5.1.3'; 
$search5 = 'Status: 5.1.6'; 
$search6 = 'Status: 5.1.8'; 


Final-Recipient: bobsyeruncle@nodomain.net 
Action: failed 
Status: 5.0.0 

Now I have the information I need to clean up my mailing list: 
the error code and the address. From here, it's a relatively simple 
task to locate this error code and then crawl up a couple of lines 
to get to the address line, and extract it to a file for each of my 
error codes. I put these into six separate files for ease of retrieval 
in case I am asked for further information at a later date. 
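
The pairing of address line and status line can be sketched compactly in Perl. This is only an illustration, not the script in Listing 1: it remembers the most recent Final-Recipient address and waits for the next Status line instead of reading a fixed number of lines, and the sub name bounces_by_status and the inline sample data are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Status codes treated as permanent failures (the six named in this article).
my %bad = map { $_ => 1 } qw(5.0.0 5.1.1 5.1.2 5.1.3 5.1.6 5.1.8);

# Takes a reference to an array of mail-file lines and returns a hash
# that maps each matching status code to the addresses that bounced with it.
sub bounces_by_status {
    my ($lines) = @_;
    my (%found, $address);
    for my $line (@$lines) {
        if ($line =~ /^Final-Recipient:\s*(?:rfc822;\s*)?(\S+)/i) {
            $address = $1;               # remember the most recent recipient
        }
        elsif (defined $address && $line =~ /^Status:\s*(\d\.\d\.\d)/) {
            push @{ $found{$1} }, $address if $bad{$1};
            undef $address;              # one status line per recipient
        }
    }
    return %found;
}

# Hypothetical sample standing in for the bounce mailbox.
my @sample = (
    "Final-Recipient: rfc822; bobsyeruncle\@nodomain.net\n",
    "Action: failed\n",
    "Status: 5.0.0\n",
    "Final-Recipient: someone\@example.org\n",
    "Action: delayed\n",
    "Status: 4.4.1\n",
);
my %found = bounces_by_status(\@sample);
print "$_: @{ $found{$_} }\n" for sort keys %found;
```

Keeping the codes in a hash means a seventh code is one more word in the list rather than another search variable and output handle.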

Listing 1 contains the script, which is split into two parts. 
The first part does the extraction from the mailfile. The second 
part of the script removes extraneous information from 
my six output files so that they include only the addresses in 
a list form. When initially extracted, the lines in the tmp files 
are in one of two formats as they came in the mail header, for 
example, this: 

Final-Recipient: rfc822; maniac@netcourrier.com 

or this: 

Final-Recipient: maniac@netcourrier.com 

Either way, I need to get rid of everything but the address, 
which I do by reversing the order of the fields and dropping off 


$status = ""; 
$i = 0; 

open (MAILFILE, $infile) || die("Could not open input file"); 

# Open a file for each of my searches - .tmp because what is extracted 
# still needs some processing 
open (temp1, '>bad-500.tmp') || die("Could not open output file"); 
open (temp2, '>bad-511.tmp') || die("Could not open output file"); 
open (temp3, '>bad-512.tmp') || die("Could not open output file"); 
open (temp4, '>bad-513.tmp') || die("Could not open output file"); 
open (temp5, '>bad-516.tmp') || die("Could not open output file"); 
open (temp6, '>bad-518.tmp') || die("Could not open output file"); 

while(<MAILFILE>){ 
    $add1 = 'Final-Recipient:';       # the Final-Recipient line 
                                      # contains the address 
    $status = $_; 
    if ($status =~ /$add1/){ 
        $address = $_; 
        for ($x=1;$x<3;$x++){         # read the next two lines to reach 
                                      # the line containing the error code 
            $_ = <MAILFILE>; 
            last unless defined $_;   # stop cleanly at end of file 
            $status = $_; 
            # \Q...\E makes the dots in the status code match literally 
            if ($status =~ /\Q$search1\E/){ 
                print temp1 "$address"; 
            } # end if 
            if ($status =~ /\Q$search2\E/){ 
                print temp2 "$address"; 
            } # end if 
            if ($status =~ /\Q$search3\E/){ 
                print temp3 "$address"; 
            } # end if 
            if ($status =~ /\Q$search4\E/){ 
                print temp4 "$address"; 
            } # end if 
            if ($status =~ /\Q$search5\E/){ 
After: 

taniac@netconuver.com 
jb@newnet.org 
funyguy@severe.net 
joeblow@garbage.com 
jeneral@fred.net 
johnnygoode@comman.doh 
zerbil@todd.ca 
goboy@betyeruncle.biz 
saliva@spit.net 
tenspot@sawbuck.com 
anybody@hetmail.com 
yermom@yerhouse.co.uk 
geffen@deuhland.bel 
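
The split, reverse, and shift step that turns the first list into the second can be tried on its own. The following is a minimal sketch; the sub name address_only is chosen for the example, and the two input lines are the sample formats quoted in this article.

```perl
use strict;
use warnings;

# Return the last whitespace-separated field of a line, which is the
# address in both Final-Recipient formats.
sub address_only {
    my ($line) = @_;
    my @fields = split ' ', $line;    # split on whitespace; the newline is dropped
    return unless @fields;            # nothing to extract from a blank line
    my @reversed = reverse @fields;   # the address is now the first field
    return shift @reversed;           # shift() hands back that first field
}

for my $line ("Final-Recipient: rfc822; maniac\@netcourrier.com\n",
              "Final-Recipient: maniac\@netcourrier.com\n") {
    print address_only($line), "\n";
}
```

Both sample lines print the bare address. The same field can also be taken directly with (split ' ', $line)[-1]; the longer form is kept here because it mirrors the steps described in the text.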

Armed with the six lists produced by this script (I would 
generally add a date stamp to the title of each so they are 
not overwritten), I have the information the mail list 
administrator needs to purge the master list. While this 
could be run as a weekly or bi-weekly cron job, I've found 
such frequency to be unnecessary, and I run it manually 
every two months. How best to present the information to 
the person or team that manages the master mailing list 
depends on the situation. For me, it's a simple HTML report 
that I generate and email with a second script. For others, 
the task of managing the master list may fall to the sys 
admin in his or her role as postmaster. 
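
Where the purge itself falls to the sys admin, it amounts to filtering the master list against the extracted addresses. The sketch below assumes both lists are already in memory, one address per element; the sub name purge_list and the example addresses are hypothetical, and a real master list may live in a database rather than a flat list.

```perl
use strict;
use warnings;

# Return the master-list entries that do not appear in the bad list.
# The comparison ignores case, and the master list order is preserved.
sub purge_list {
    my ($master, $bad) = @_;                      # two array references
    my %is_bad = map { (lc($_) => 1) } @$bad;     # lookup table of bad addresses
    return grep { !$is_bad{ lc $_ } } @$master;
}

my @master = qw(alice@example.com bob@example.net carol@example.org);
my @bad    = qw(BOB@example.net);
print "$_\n" for purge_list(\@master, \@bad);
```

Folding addresses to lower case before comparing is a design choice for the example; it avoids missing an entry that differs from the bounce report only in capitalization.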





Jeff Harnett has been a systems administrator for six years, focusing 
mainly on Web and retail systems running Solaris, AIX, and Linux. 
Currently working on a consulting basis in Toronto, he also occasionally 
teaches Unix administration (Solaris certification track) at several 
Toronto technical institutions. He can be reached at 
selvasys@rogers.com. 



Listing 1 continued 

                print temp5 "$address"; 
            } # end if 
            if ($status =~ /\Q$search6\E/){ 
                print temp6 "$address"; 
            } # end if 
        } # end for 
    } # end if 
    $i++; 
    $status = ""; 
} # end while 

close MAILFILE; 
# close() takes a single filehandle, so close each output file in turn 
close temp1; close temp2; close temp3; 
close temp4; close temp5; close temp6; 

# PART TWO: remove extraneous content from each of the 6 tmp files 

$inputfile = "bad-500.tmp";    # define the files and call the 
                               # addressonly subroutine 
$outfile = "bad-500.txt"; 
&addressonly; 

$inputfile = "bad-511.tmp"; 
$outfile = "bad-511.txt"; 
&addressonly; 

$inputfile = "bad-512.tmp"; 
$outfile = "bad-512.txt"; 
&addressonly; 

$inputfile = "bad-513.tmp"; 
$outfile = "bad-513.txt"; 
&addressonly; 

$inputfile = "bad-516.tmp"; 
$outfile = "bad-516.txt"; 
&addressonly; 

$inputfile = "bad-518.tmp"; 
$outfile = "bad-518.txt"; 
&addressonly; 

sub addressonly { 
    open (INFILE, $inputfile) || die("Could not open input file!"); 
    open (BOUNCE, ">$outfile") || die("Could not open output file!"); 
    $i = 0; 
    while(<INFILE>){ 
        $line = $_; 
        @a = split(" ", $line);    # split the line into fields 
        @ra = reverse(@a);         # reverse the fields so the address 
                                   # is the first field 
        @sra = shift(@ra);         # keep the first field (the address) 
                                   # and drop everything after it 
        print BOUNCE "@sra\n"; 
        $i++;                      # move to the next line in the tmp file 
    } # end while 
    close INFILE; 
    close BOUNCE; 
} 

# One last thing - remove the temp files 
unlink("bad-500.tmp", "bad-511.tmp", "bad-512.tmp", "bad-513.tmp", 
       "bad-516.tmp", "bad-518.tmp") or die "Couldn't unlink $!\n"; 

#end of script 
Spam Graphing and Logging for 
SpamAssassin Rule Optimization 

James Mikusi 


During my tenure as a systems administrator, I've 
noticed that admins fall into two disparate groups 
based on how they approach a problem. The first 
group aggressively works toward a solution and closure to the 
problem, trying any potential change that might make the fix. 
The other group works more methodically, making calculated 
adjustments and reversible changes. I've come to appreciate 
both groups, especially the former when it's important to just 
"get the job done", but getting a grip on spam requires the 
more deterministic approach. Counting and graphing your 
spam, for example, can help you see just how big your problem 
might be and how best to attack it. 

This article details how to gather statistics on mail that is 
filtered through SpamAssassin and how to plot those numbers 
with MRTG. This project began when I decided to learn 
exactly how much spam I received in a given period; it grew 
when I found some oddities in the SpamAssassin rules that 
matched most frequently. I should add that when I began this 
project I had already invested considerable time tuning 
SpamAssassin's Bayesian database. In my opinion, this 
remains one of the strongest defenses against spam on a per-user 
basis, because what is spam to you is not necessarily spam 
to your neighbor. Thus, teaching SpamAssassin to recognize 
what's spam to you is important. 

On that note, you also should be aware that the implementation 
described is designed for a single user. The 
scripts could easily be edited for use at the domain level. 
However, the objective here is to tune SpamAssassin, 
which is difficult to do when you must make global assumptions 
about what hundreds of users might agree is spam. The 
methods described increase the effectiveness of Bayes filtering 
by finding out which rules are triggered most often. 
This is done by counting incoming spam and graphing the 
numbers. 

Two direct dependencies are used in this article's features: 
SpamAssassin and MRTG, both depending on Perl. Both 
packages are included with almost every Linux distribution, 
thus their installation will not be covered here. The projects' 
Web sites (see the References) contain thorough documentation 
as well. A potential third dependency might be procmail, 
but your favorite local mail agent can be used to filter incoming 
mail through SpamAssassin. I like procmail and will 
describe how I used it. 


Getting the Statistics 

The first step in implementing this spam control suite is 
having your incoming mail filtered through SpamAssassin 
before delivery. This is where I use procmail. The following 
recipe at the start of your .procmailrc file in your home directory 
will pipe mail through SpamAssassin: 

:0fw 

| /usr/bin/spamc 

This use depends on having the spamd daemon running, 
which I highly recommend for efficiency. If, for any reason, 
running the daemon doesn't suit you, mail can alternatively 
be piped to /usr/bin/spamassassin, but this setup will spawn 
a separate perl/spamassassin process for each mail. My 
home mail server runs fetchmail to get 10 mails per call, 
which would bring this machine, a PII-350, to its knees if 
called in the latter manner. 

This setup alone will do SpamAssassin's default actions: 
tagging your mail headers and prepending the mail's subject line 
with SpamAssassin's default "***SPAM***" tag. While these 
tags are useful to end users, the utilities of this article depend 
on the X-Spam-Flag and X-Spam-Status mail headers, which carry 
a Yes/No spam assertion and SpamAssassin's score based on its scoring 
rules. We'll make use of these features by asking procmail to 
do a few more things with our mail. 

Although it might seem odd, we're going to filter the 
mail through SpamAssassin a second time, but this time the 
custom script this article features makes use of the Perl 
module Mail::SpamAssassin::NoAudit, which doesn't deal 
with the full overhead of SpamAssassin. The next release of 
this project will likely eliminate this duality, so check for 
updates. The following should appear next in .procmailrc: 

:0c 

| .spamassassin/bin/spamassassin_stats.pl 

Also, note that the following procmail recipe was an early 
implementation of this tool and worked quite well, but then 
these responsibilities got snarfed into the above script for the 
sake of consolidation. It nicely creates two counter files and 
delivers mail to a spam mbox file and non-spam mail as usual: 

:0 
* ^X-Spam-Flag:.*YES.* 
{ 
    # deliver to spam mbox file AND incr spam counter file 
    :0 c 
    | echo -n . >> .spamassassin/count.spam 

    :0 
    mail/spam 
} 

:0c 
| echo -n . >> .spamassassin/count.ham 

The spamassassin_stats.pl script uses the ~/.spamassassin/stats 
directory to keep its count (see Listing 1 at 
http://www.sysadminmag.com). There are two files, named 
counts.spam and counts.ham, which tally their respective mail 
types. Additionally, there are two files to keep track of 
SpamAssassin scores (scores.spam and scores.ham) and two 
directories (named "spam" and "ham"). These directories hold 
some interesting statistics: a file named for each 
SpamAssassin rule matched, with its size being the count of 
matches. Thus, a simple ls -lS | head in the spam/ or ham/ 
subdirectories can quickly show the most common characteristics 
in your spam. This feature alone may suit some admins who 
just want to quickly see some numbers, but the graphing done 
by MRTG really adds some nice documentation of spam 
abuse. Another quick option is to point your Web server to this 
stats directory (assuming directory listings are permitted). 
Apache has linked column headers, which sort for that specific 


Depending on your influx of mail, it might be beneficial to 
reduce this frequency to dramatize your graphs. 

The spamstats.cfg file can be extended to create as many 
graphs as you need, but the file used here just graphs incoming 
spam counts and the percentage of mail that is spam. The reality 
of these graphs may be surprising. I was shocked and disappointed 
to discover that I get more than 90% spam! 

If you're familiar with MRTG, you probably know it can 
quickly be configured to graph port traffic from your routers 
or switches, as it was designed to do. However, it can also be 
extended to graph almost anything. By default, MRTG queries 
a router and expects four lines in return, of which the first two 
are the counts of inbound and outbound bytes, and the second 
two are the sysUptime and sysName MIB entries. The first 
two lines are completely arbitrary and can be used to represent 
anything. The scripts called via spamstats.cfg do just this. 
They get the numbers via file size in the stats directory tree 
and return them to MRTG. It's almost too easy. 
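
As a rough illustration of the four-line convention, the Perl sketch below reports two counter-file sizes in the form MRTG reads from an external program. It is not the script from the online listing. The file names follow the stats directory described in this article, while the sub name mrtg_lines, the target name spamcount, and the choice of spam and total mail as the two values are assumptions.

```perl
use strict;
use warnings;

# Build the four lines MRTG reads from an external program:
# two values, then an uptime string and a target name.
# Each counter is the size in bytes of a counter file;
# a missing or empty file counts as zero.
sub mrtg_lines {
    my ($spam_file, $ham_file, $name) = @_;
    my $spam = (-s $spam_file) || 0;
    my $ham  = (-s $ham_file)  || 0;
    return ($spam, $spam + $ham, 'unknown', $name);
}

# The directory is an assumption based on the layout described above.
my $dir = ($ENV{HOME} // '.') . '/.spamassassin/stats';
print "$_\n" for mrtg_lines("$dir/counts.spam", "$dir/counts.ham", 'spamcount');
```

Because the two values are arbitrary to MRTG, the same sub could just as well report a single rule's file from the spam/ directory in place of the total.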

The initial versions of these scripts also carried the overhead 
of keeping track of the counter files and clearing them 
periodically, but as it turns out, MRTG takes care of maintaining 
a database and has features to reset counters. 
Whether you're using RRD (Round Robin Database, a preferred 
logging mechanism for MRTG) or MRTG's default 
text database scheme, MRTG does all the work of keeping 
track of historical data. This is done by integrating new data 
into historical averages. 

From the perspective of MRTG, this is all that's needed 
to create the Yearly, Monthly, and Weekly graphs. If more 
detailed historical data is desired, it can easily be main¬ 


column. Use this to sort your stats. 
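
The size-as-count idea is easy to reproduce. The sketch below appends one byte per match to a file named after the rule and then ranks the files by size, the same view ls -lS gives. Whether spamassassin_stats.pl does exactly this is not shown in the text, so treat it as an assumption; the sub names bump and top_rules and the scratch directory are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Add one to a counter by appending a single byte to its file,
# so that the file size is the count.
sub bump {
    my ($file) = @_;
    open(my $fh, '>>', $file) or die "Cannot append to $file: $!";
    print {$fh} '.';
    close($fh) or die "Cannot close $file: $!";
}

# Return [rule, count] pairs for every file in a directory,
# largest count first, with ties broken by name.
sub top_rules {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @rules = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return sort { $b->[1] <=> $a->[1] or $a->[0] cmp $b->[0] }
           map  { [ $_, (-s "$dir/$_") || 0 ] } @rules;
}

my $dir = tempdir(CLEANUP => 1);    # scratch directory for the demo
bump("$dir/BAYES_99") for 1 .. 3;   # three matches for one rule
bump("$dir/HTML_MESSAGE");          # one match for another
printf "%-15s %d\n", @$_ for top_rules($dir);
```

Appending a byte needs no locking or read-modify-write cycle, which is presumably why the counter files are kept this way.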

Graphing the Stats with MRTG 

As with the common use of MRTG, the mrtg binary should be 
run about every 20 to 30 minutes via cron, but we'll be using a 
custom config file named .spamassassin/stats/mrtg/spamcount.cfg 
(see Listing 2 at http://www.sysadminmag.com). This will be the 
only required argument to mrtg in your cron entry: 

7,37 * * * * /usr/bin/mrtg $HOME/mrtg/spam/spamstats.cfg 


tained by a few edits to these scripts. However, the counter 
files do need to be periodically reset. The ThreshMaxI and 
ThreshProgI MRTG configuration options let us set a counter 
threshold and a program to reset the values, respectively. Just 
as your switch's counter registers reset when they hit the ceiling 
of a 32-bit register, we'll do the same. We'll set the magic 
number to 1024 because a default ext2 filesystem makes use of 
a 4K block size. This is the number to which we'll configure 
ThreshMaxI and ThreshMaxO to respond. 

To finish the presentation, we'll use indexmaker, a Perl 
script that's part of the MRTG distribution. We can feed this script 
the spamstats.cfg MRTG config file as an argument, and it'll generate 
appropriate HTML for an index.html file containing a list of 
all the monitored objects in tabular format with the five-minute 
average graphs. This provides a quick overview of the current status. 
Clicking on any graph will take you to that monitored object's full 
page with the Weekly, Monthly, and Yearly graphs. 

Tuning SpamAssassin 
for Better Filtering 

Now that we can "see" our 
spam from a higher perspective, 
SpamAssassin can be tuned for 
better filtering. The default 
values that SpamAssassin gives 


Figure 1 avg_spam_score-day 



Figure 2 myhost_pct_spam-week 

to rules are configured in /etc/mail/spamassassin/local.cf. 
When I first began filtering my mail with these scripts, I 
was surprised to see how many mails scored higher than 
the Bayesian 90th percentile. By increasing the weight of 
frequent culprits in my .spamassassin/user_prefs file, I 
also increased the number of mails matched above the 90th 
percentile. Likewise, if you find you never get any non-spam 
mail hitting above the 30th Bayesian percentile, you 
can comfortably set the Bayesian watermark to 70 instead 
of the default of 99. Here are some of my 
.spamassassin/user_prefs settings: 


# score adjustments 

score DATE_IN_FUTURE_03_06 5.0 
score INVALID_DATE 3.5 
score DOMAIN_SUBJECT 2.5 

# trigger and bayesian learning thresholds 
required_hits 3.5 
auto_learn_threshold_spam 7 

The roots of this project began with filtering my personal mail, 
and I have been continually tempted to try these utilities at the 
server level (I haven't yet). However, it seems most anti-spam 
whitepapers emphasize the point that Bayesian filtering is 
strongest per user. Although I would expect the graphing to be 
helpful at the server level, I would also anticipate that one 
small change to benefit one user's spam problem might create 
false positives for another. 


Conclusion 

If you've been using MRTG to track router traffic, you'll 
likely agree as to the convenience of seeing this information 
graphically. Many sys admins are already overtaxed with 
responsibilities, thus the more utilities we have to see what our 
system is doing, the better. And, while most of us pride ourselves 
on being able to find almost any system stat from the 
command line, it's undeniably helpful to have graphical tools. 

An extended hope of mine is that this suite of scripts can 
help legislation catch up with the spam epidemic. Although 
spam provides a lot of job security to sys admins, I think we 
would all prefer to see it disappear so we could work on bigger 
and better things. I hope these graphs can be used to show 
management and politicians how badly some of us are plagued 
by spam and thereby losing productivity. Managers and politicians 
may be more receptive to statistical complaints, graphs, 
and pie charts than other forms of information. 